Creating Good Graphics

Susan Vanderplas

2024-06-26

Identifying the Problem

Pie Chart Poll Results

A newspaper clipping from the Scottsbluff Star-Herald, showing a pie chart of support for the marijuana legalization initiative in Nebraska, from Tuesday, March 16, 2021. The yes slice (which seems to be about 56% of the area) is labeled 44%, while the no slice (which seems to be about 44% of the area) is labeled 56%.
Figure 1: Scottsbluff Star-Herald reader poll. Source
  • What is wrong with this chart?

  • Do you think it might be misleading? If so, how?

  • Do you think the mistakes were intentional?
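One way to check a pie chart like Figure 1 is to work out how much of the circle each labeled percentage should occupy. The following is a minimal sketch in Python, not part of the original slides; the function name `slice_angles` is hypothetical, and the percentages are the labels given in the figure description.

```python
def slice_angles(shares):
    """Convert percentage shares (which must sum to 100) into slice angles in degrees."""
    total = sum(shares.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"shares sum to {total}, not 100")
    return {label: 360 * pct / 100 for label, pct in shares.items()}


# Labels printed on the chart, taken from the figure description.
poll = {"yes": 44, "no": 56}
angles = slice_angles(poll)

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

Under these labels, the "no" slice should be the larger of the two, which can be compared against what the printed chart shows.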

High Support

A CBS News pie chart of Americans who have tried marijuana, showing 51% today, 43% last year, and 34% in 1997. The chyron below the image says 'High support for legalizing marijuana. More than half of Americans say they've tried pot'

Figure 2: Source

  • What is wrong with this chart?

  • What would you change to more accurately represent the data?

  • Do you think the mistakes were intentional?
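A pie chart implies that its slices are parts of a single whole. A quick sanity check, sketched below in Python under that assumption, is to add up the percentages; the helper `is_part_of_whole` and its tolerance are hypothetical, and the values are those listed in the figure description.

```python
def is_part_of_whole(percentages, tolerance=1.0):
    """Return True if the percentages add up to roughly 100."""
    return abs(sum(percentages) - 100) <= tolerance


# Values shown in the CBS News chart, one per survey year.
tried = {"today": 51, "last year": 43, "1997": 34}

total = sum(tried.values())
print(f"total: {total}%")
print("parts of a whole?", is_part_of_whole(tried.values()))
```

Because each value comes from a different survey year, the three numbers are not shares of one total, so the check fails.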

Gas Prices

Two bar charts showing the % increase in petrol and diesel prices in India (2018). The first chart shows an increase of 20.5% from 2004 to 2009 (real values of 33.71 to 40.62), an increase of 75.0% from 2009 to 2014 (real values of 40.62 to 71.41), and a 13% decrease from 2014 to 2018 (real values of 71.41 to 80.73). The last bar and arrow are shown in yellow, while the first three bars are shown in green. The second chart shows diesel prices, with real values of 21.74, 30.86, 56.71, and 72.83 in 2004, 2009, 2014, and 2018, respectively. Arrows show the change between each price set, with a 42% increase from 2004 to 2009, an 83.7% increase from 2009 to 2014, and a 28% decrease from 2014 to 2018, which is highlighted in yellow. At the bottom of each chart, an image of Narendra Modi is shown.

Figure 3: Gas and Diesel price changes in India (2004 - 2018).

  • What is wrong with this?

  • What design choices contribute to the problems?

  • Do you think this was intentionally designed to be misleading? Why or why not?
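The percentage labels in Figure 3 can be checked directly against the prices printed on the bars. The sketch below, in Python, recomputes the change between consecutive years; the function names `pct_change` and `changes` are hypothetical, and the prices are copied from the figure description.

```python
def pct_change(old, new):
    """Percent change from old to new; a positive result means an increase."""
    return (new - old) / old * 100


def changes(prices):
    """Percent change between each pair of consecutive years, rounded to one decimal."""
    years = sorted(prices)
    return {
        (start, end): round(pct_change(prices[start], prices[end]), 1)
        for start, end in zip(years, years[1:])
    }


# Prices as listed in the figure description.
petrol = {2004: 33.71, 2009: 40.62, 2014: 71.41, 2018: 80.73}
diesel = {2004: 21.74, 2009: 30.86, 2014: 56.71, 2018: 72.83}

petrol_changes = changes(petrol)
diesel_changes = changes(diesel)

for fuel, result in (("petrol", petrol_changes), ("diesel", diesel_changes)):
    for (start, end), pct in result.items():
        print(f"{fuel} {start} to {end}: {pct:+.1f}%")
```

The sign of each result shows whether the price rose or fell over the period, which can then be compared with the direction of the arrows and the heights of the bars in the chart.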

Information Overload

  • What problems do you have reading this chart?

  • Can you compare the quantities of all 6 variables shown? Why or why not?

(Yes, the blog this chart is taken from is satirical. This is not a recommended graphical form.)

Designing Good Charts

Why Graphics Matter

Graphics are a form of external cognition that allows us to think about the data rather than the chart.

That is, graphics are a tool to make it easier for us to think about what the data means.

Good graphics take advantage of how the brain works, leveraging

  • preattentive processing

  • perceptual grouping

  • awareness of visual limitations

Good graphics also depend on the data: the chart type should be chosen based on the types of variables you want to display, the amount of data you have, and the results you want to highlight.

Example: Hertzsprung-Russell Diagram

A scatter plot showing the color index of a star on the x-axis and the absolute magnitude (brightness) of the star on the y-axis. Points are colored by spectral class, which varies from blue to white to yellow to red as the color index increases and the star's temperature decreases. Points are primarily located along a downward-sloping line from the top left to the bottom right, which is labeled the 'main sequence'. There is another set of points which diverges from the main sequence and extends out horizontally in the middle of the graph; these are labeled 'giants', and a few outliers that are above the giant cluster are labeled 'supergiants'. Below the main sequence stars, there are outliers which are labeled 'dwarfs'.

The Hertzsprung-Russell diagram was developed independently by Ejnar Hertzsprung (1873–1967) and Henry Norris Russell (1877–1957). The diagram plots the color index of a star against its brightness (absolute magnitude). As a result, it is possible to discern that these two variables are related and change together over a star’s life cycle: a hypothesis that only came to be because of this chart.

I’ve used data from the HYG Database to generate this chart. Only stars within 500 AU are shown.

Preattentive Perception

  • Occurs automatically (no effort)

  • Color, shape, angle

  • Combinations of preattentive features require attention

    • Double-encoding (using multiple features for the same variable) is ok

Figure 5: Two scatterplots, (a) Shape and (b) Color, each with one point that is different. Can you easily spot the different point?

Figure 6: Two scatterplots, (a) Shape and Color (dual encoded) and (b) Shape and Color (different variables). Can you easily spot the different point(s)?